First I can iterate over all pred_df.csv files in the Output folder
and extract the observed effect,
average effect, and WT effect which can be
used for the transformations, producing the merged file.
| observed_effect | average_effect | wt_effect | unique_id |
|---|---|---|---|
| 0.000000 | -1.585059 | 0.000000 | AP_catef_1 |
| -2.924453 | -2.827816 | -2.924453 | AP_catef_1 |
| -4.943095 | -5.610975 | -4.943095 | AP_catef_1 |
| -5.623423 | -6.853733 | -7.867548 | AP_catef_1 |
| -2.352617 | -1.596688 | -2.352617 | AP_catef_1 |
| -3.156145 | -2.839445 | -5.277070 | AP_catef_1 |
I fit cubic splines with penalty towards monotonicity by varying degrees ranging from 1-5. Then, I used AIC to select the strongest model for each landscape, and log the number of degrees for the upcoming transform.
| Unique_ID | Best_Degree | Best_AIC |
|---|---|---|
| AP_catef_1 | 4 | 79.375299 |
| DHFR_ic50_c57 | 1 | 22.838106 |
| DHFR_ic50_c58 | 1 | 29.992922 |
| DHFR_ic50_c59 | 4 | 18.441344 |
| DHFR_ic50_c60 | 5 | 24.490984 |
| DHFR_ic50_c61 | 5 | 9.666351 |
| DHFR_ic75_palmer | 5 | 197.681499 |
| DHFR_kcat_trajg | 4 | 6.681135 |
| DHFR_kcat_trajr | 5 | -3.673020 |
| DHFR_ki_trajg | 5 | 18.610525 |
| DHFR_ki_trajr | 4 | 6.720748 |
| MPH_catact_CaPTM | 1 | 35.963443 |
| MPH_catact_CdPTM | 1 | -11.111711 |
| MPH_catact_CoPTM | 1 | 24.892223 |
| MPH_catact_CuPTM | 1 | 23.676399 |
| MPH_catact_MgPTM | 1 | 26.680008 |
| MPH_catact_MnPTM | 1 | 11.392128 |
| MPH_catact_NiPTM | 1 | 4.306652 |
| MPH_catact_ZnPTM | 4 | 46.094248 |
| NfsA_ec50_2039 | 5 | -101.318022 |
| NfsA_ec50_3637 | 5 | -65.972370 |
| OXA-48_ic50_CAZtraj1 | 5 | -32.759419 |
| OXA-48_ic50_CAZtraj2 | 4 | -26.712493 |
| OXA-48_ic50_CAZtraj3 | 1 | -99.376569 |
| OXA-48_ic50_PIPtraj1 | 5 | 6.216836 |
| OXA-48_ic50_PIPtraj2 | 5 | 28.085051 |
| OXA-48_ic50_PIPtraj3 | 4 | -20.227528 |
| PTE_catact_2NH | 5 | 54.866422 |
| PTE_catact_butyrate | 5 | -15.140782 |
| TEM_MIC_weinreich | 5 | 39.737284 |
| TEM_growth_AM | 4 | 13.245353 |
| TEM_growth_AMC | 4 | 7.064258 |
| TEM_growth_AMP | 5 | 42.425104 |
| TEM_growth_CAZ | 1 | 40.127210 |
| TEM_growth_CEC | 1 | 39.052728 |
| TEM_growth_CPD | 5 | 32.778424 |
| TEM_growth_CPR | 1 | 30.292993 |
| TEM_growth_CRO | 5 | 40.952662 |
| TEM_growth_CTT | 1 | 45.810669 |
| TEM_growth_CTX | 5 | 36.284359 |
| TEM_growth_CXM | 4 | 23.141110 |
| TEM_growth_FEP | 1 | 26.596062 |
| TEM_growth_SAM | 5 | 22.150169 |
| TEM_growth_TZP | 5 | 21.309300 |
| TEM_growth_ZOX | 5 | 23.862908 |
I am then able to plot what the transformation function looks like (in blue) for each landscape with the x-axix representing additive effects, and y axis representing observed effects.
We can also inspect the transformed data by plotting scatterplots against the previous observed effects in order to gauge how the data will be transformed for each landscape.
## Warning in bs(average_effect, degree = 4L, knots = numeric(0), Boundary.knots =
## c(-10.1510461496875, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-0.3763691616875, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-0.669048522875, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 4L, knots = numeric(0), Boundary.knots =
## c(-0.6773956788125, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-0.0561651719375001, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(0.1505184723125, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-0.175174974098038, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 4L, knots = numeric(0), Boundary.knots =
## c(-2.0188274643125, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-1.6236400111875, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-0.692822294062501, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 4L, knots = numeric(0), Boundary.knots =
## c(-0.564818362125001, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-0.473301722907051, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(0.142740045279341, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-0.714866858571236, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-0.256493158789162, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-0.525714893344278, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-0.65621332985174, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-0.456627754334221, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 4L, knots = numeric(0), Boundary.knots =
## c(-0.687987888097287, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-0.511775567217363, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-0.33614802335173, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-0.28127227342755, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 4L, knots = numeric(0), Boundary.knots =
## c(-0.282094961196305, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-0.160628161189058, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-2.25266045395833, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-2.31931248227604, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 4L, knots = numeric(0), Boundary.knots =
## c(-1.99818581863791, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-0.188262458006133, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-0.591142196043501, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-0.751685262437498, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 4L, knots = numeric(0), Boundary.knots =
## c(-0.616625, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases
## Warning in bs(average_effect, degree = 4L, knots = numeric(0), Boundary.knots =
## c(-0.728625, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases
## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-1.11625, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases
## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-0.77925, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases
## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-1.9028125, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases
## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-0.428875, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases
## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-1.5428125, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases
## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-0.637625, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases
## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-0.204875, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases
## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-0.3609375, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases
## Warning in bs(average_effect, degree = 4L, knots = numeric(0), Boundary.knots =
## c(-0.6940625, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases
## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-0.32075, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases
## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-1.473875, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases
## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-0.3038125, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases
The drawback of using monotonic splies (or even power transforms), as can be seen from the plots above, is the lack of bounding. Bounding is crucial to avoid greatly transforming phenotype values that lay outside of the bounds of the transform, which itself is constrained by the range of the predicted first-order effects.
In other words, phenotypes with magnitudes that lay outside of the range of predicted values by the first-order model are at risk of being incorrectly transformed. Though this is somewhat true for a four-parameter model, the bounded upper- and lower- thresholds ensure that phenotypes that lay outside of the predicted range will be approximated more accurately, i.e., the transformation will be lower in magnitude than other models.
Here we attempt to transform all datasets using the four-paramter function, with fitting performed by non-linear least squares regression using nlsLM. The fits are compared to simple linear models using an AIC to determine whether the additional information stemming from the four-parameter transform is parsimonous, otherwise, no transform is applied.
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
Indeed, these fits appear much better visually, and we also avoid overfitting to datasets by simply utilizing the linear model. We can then use this analysis to transform and export these new data.
Upon performing transformations, it is also useful to see how the R2 value improves after transformations. Provided is the table of the pre- and post- transformation R2 values for a linear regression.
Here we need to re-import the data with the genotype column, and also get a list of columns that start with “p”, i.e., for AP we need to get p101, p166, p153, p322, and p328. We can scrub the p’s and save this as a vector.
The output csv files should have the following format (an example for AP):
| WT | ||||
|---|---|---|---|---|
| WT-seq | ||||
| Positions | ||||
| p101 | p166 | p153 | p322 | p328 |
| Genotype | Function | |||
| DRDEK | 0 |
and so on…
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
After the export is done, the data can be re-analyzed using the other scripts in this directory.